SoF: Soft-Cluster Matrix Factorization for Probabilistic Clustering
نویسندگان
چکیده
We propose SoF (Soft-cluster matrix Factorization), a probabilistic clustering algorithm which softly assigns each data point into clusters. Unlike model-based clustering algorithms, SoF does not make assumptions about the data density distribution. Instead, we take an axiomatic approach to define 4 properties that the probability of co-clustered pairs of points should satisfy. Based on the properties, SoF utilizes a distance measure between pairs of points to induce the conditional co-cluster probabilities. The objective function in our framework establishes an important connection between probabilistic clustering and constrained symmetric Nonnegative Matrix Factorization (NMF), hence providing a theoretical interpretation for NMF-based clustering algorithms. To optimize the objective, we derive a sequential minimization algorithm using a penalty method. Experimental results on both synthetic and real-world datasets show that SoF significantly outperforms previous NMF-based algorithms and that it is able to detect non-convex patterns as well as cluster boundaries.
منابع مشابه
Local quality functions for graph clustering with non-negative matrix factorization.
Many graph clustering quality functions suffer from a resolution limit, namely the inability to find small clusters in large graphs. So-called resolution-limit-free quality functions do not have this limit. This property was previously introduced for hard clustering, that is, graph partitioning. We investigate the resolution-limit-free property in the context of non-negative matrix factorizatio...
متن کاملHINMF: A Matrix Factorization Method for Clustering in Heterogeneous Information Networks
Non-negative matrix factorization (NMF) has become quite popular recently on the relational data due to its several nice properties and connection to probabilistic latent semantic analysis (PLSA). However, few algorithms take this route for the heterogeneous networks. In this paper we propose a novel clustering method for heterogeneous information networks by searching for a factorization that ...
متن کاملMulti-way clustering of microarray data using probabilistic sparse matrix factorization
MOTIVATION We address the problem of multi-way clustering of microarray data using a generative model. Our algorithm, probabilistic sparse matrix factorization (PSMF), is a probabilistic extension of a previous hard-decision algorithm for this problem. PSMF allows for varying levels of sensor noise in the data, uncertainty in the hidden prototypes used to explain the data and uncertainty as to ...
متن کاملNMF-based Models for Tumor Clustering: A Systematic Comparison
Nonnegative Matrix Factorization (NMF) is one of the famous unsupervised learning models. In this paper, we give a short survey on NMF-related models, including K-means, Probabilistic Latent Semantic Indexing etc. and present a new Posterior Probabilistic Clustering model, and compare their numerical experimental results on five real microarray data. The results show that i) NMF using with K-L ...
متن کاملProbabilistic sparse matrix factorization with an application to discovering gene functions in mouse mRNA expression data
We address the problem of sparse matrix factorization using a generative model. Our algorithm, probabilistic sparse matrix factorization (PSMF), is a probabilistic extension of a previous hard-decision algorithm for this problem (Srebro and Jaakkola 2001). PSMF allows for varying levels of sensor noise in the data, uncertainty in the hidden prototypes used to explain the data, and uncertainty a...
متن کامل